Channel-wise attention


[…]tion at the same layer (R1, R2, R3), and showed better performance than baselines and SE-Nets on image classification […]

Neural Information Processing Systems

We thank the reviewers for their comments. All reviewers think the paper is clearly written and easy to read. We address the reviewers' concerns below. […] We will include these statistics in the paper. All of this suggests that the improvement is not simply due to the increased model size.


Controllable diffusion-based generation for multi-channel biological data

Zhang, Haoran, Zhou, Mingyuan, Tansey, Wesley

arXiv.org Artificial Intelligence

Spatial profiling technologies in biology, such as imaging mass cytometry (IMC) and spatial transcriptomics (ST), generate high-dimensional, multi-channel data with strong spatial alignment and complex inter-channel relationships. Generative modeling of such data requires jointly capturing intra- and inter-channel structure, while also generalizing across arbitrary combinations of observed and missing channels for practical application. Existing diffusion-based models generally assume low-dimensional inputs (e.g., RGB images) and rely on simple conditioning mechanisms that break spatial correspondence and ignore inter-channel dependencies. This work proposes a unified diffusion framework for controllable generation over structured and spatial biological data. Our model contains two key innovations: (1) a hierarchical feature injection mechanism that enables multi-resolution conditioning on spatially aligned channels, and (2) a combination of latent-space and output-space channel-wise attention to capture inter-channel relationships. To support flexible conditioning and generalization to arbitrary subsets of observed channels, we train the model using a random masking strategy, enabling it to reconstruct missing channels from any combination of inputs. We demonstrate state-of-the-art performance across both spatial and non-spatial prediction tasks, including protein imputation in IMC and gene-to-protein prediction in single-cell datasets, and show strong generalization to unseen conditional configurations.
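
A minimal PyTorch sketch of the random masking strategy described above, assuming channels-first image tensors; the keep probability and function name are illustrative, not taken from the paper:

```python
import torch

def random_channel_mask(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """Sample a random subset of channels to condition on.

    x: (batch, channels, height, width) multi-channel image.
    Returns the masked conditioning tensor and the binary mask, so a
    model trained on (masked_x, mask) pairs learns to reconstruct the
    hidden channels from any combination of observed ones.
    """
    b, c, _, _ = x.shape
    # Keep each channel independently with probability 0.5 (assumed);
    # re-sample any all-zero row so at least one channel is observed.
    mask = torch.bernoulli(torch.full((b, c), 0.5))
    empty = mask.sum(dim=1) == 0
    while empty.any():
        mask[empty] = torch.bernoulli(torch.full((int(empty.sum()), c), 0.5))
        empty = mask.sum(dim=1) == 0
    mask = mask.view(b, c, 1, 1)
    return x * mask, mask
```

At sampling time, the same interface covers arbitrary conditional configurations: set the mask to exactly the channels that were measured and let the model generate the rest.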


Reviews: Controllable Text-to-Image Generation

Neural Information Processing Systems

The paper is well organized and clearly written, and is easy to follow. In particular, instead of generating a new image from the text, the authors focus on image manipulation driven by a modified natural-language description. On the word-level spatial and channel-wise attention-driven generator: (1) The novelty and effectiveness of the attentional generator may be limited. Specifically, the paper designs a word-level spatial and channel-wise attention-driven generator with two attention parts (i.e., spatial attention and channel-wise attention). However, since the spatial attention is based on the method in AttnGAN [7], most of the contribution may lie in the additional channel-wise part.
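
For readers unfamiliar with the channel-wise part under discussion, the intuition is that text selects which feature channels (visual attributes) to amplify, while spatial attention selects where to look. A minimal, hedged PyTorch sketch of a text-conditioned channel gate; note it pools words into a single sentence context for brevity, whereas the reviewed paper operates at the word level, and all names here are illustrative:

```python
import torch
import torch.nn as nn

class WordChannelGate(nn.Module):
    """Gate image channels with text-derived weights.

    A sketch of the channel-wise attention idea in a text-to-image
    generator: the text decides *which* feature channels to amplify,
    complementing spatial attention, which decides *where* to attend.
    """
    def __init__(self, word_dim: int, channels: int):
        super().__init__()
        self.proj = nn.Linear(word_dim, channels)

    def forward(self, feat: torch.Tensor, words: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; words: (B, T, D) embeddings
        ctx = words.mean(dim=1)                      # (B, D) pooled text context
        gate = torch.sigmoid(self.proj(ctx))         # (B, C) per-channel weights
        return feat * gate.view(*gate.shape, 1, 1)   # reweight channels
```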


Improving Transformer-based Networks With Locality For Automatic Speaker Verification

Sang, Mufan, Zhao, Yong, Liu, Gang, Hansen, John H. L., Wu, Jian

arXiv.org Artificial Intelligence

Recently, Transformer-based architectures have been explored for speaker embedding extraction. Although the Transformer employs the self-attention mechanism to efficiently model global interactions between token embeddings, it is inadequate for capturing short-range local context, which is essential for accurate extraction of speaker information. In this study, we enhance the Transformer with locality modeling in two directions. First, we propose the Locality-Enhanced Conformer (LE-Conformer) by introducing depth-wise convolution and channel-wise attention into the Conformer blocks. Second, we present the Speaker Swin Transformer (SST) by adapting the Swin Transformer, originally proposed for vision tasks, into a speaker embedding network. We evaluate the proposed approaches on the VoxCeleb datasets and a large-scale Microsoft internal multilingual (MS-internal) dataset. The proposed models achieve 0.75% EER on the VoxCeleb1 test set, outperforming previously proposed Transformer-based models and CNN-based models such as ResNet34 and ECAPA-TDNN. When trained on the MS-internal dataset, the proposed models achieve promising results, with a 14.6% relative reduction in EER over the Res2Net50 model.
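
A rough PyTorch sketch of the locality-enhancement recipe named in the abstract, i.e., depth-wise convolution followed by channel-wise (squeeze-and-excitation style) attention over frame-level features. The module layout, kernel size, and reduction ratio are assumptions for illustration, not the paper's exact LE-Conformer block:

```python
import torch
import torch.nn as nn

class LocalityModule(nn.Module):
    """Depth-wise convolution + channel-wise attention over frames.

    The depth-wise convolution supplies the short-range context that
    pure self-attention misses; the squeeze-and-excitation gate then
    reweights channels using a global (per-utterance) descriptor.
    """
    def __init__(self, channels: int, kernel_size: int = 15, reduction: int = 8):
        super().__init__()
        self.dwconv = nn.Conv1d(channels, channels, kernel_size,
                                padding=kernel_size // 2, groups=channels)
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                      # squeeze over time
            nn.Conv1d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv1d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                 # per-channel gate
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T) frame-level features
        y = self.dwconv(x)           # short-range local context per channel
        return y * self.se(y) + x    # channel reweighting + residual
```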


PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression

Zhao, Sicheng, Jia, Zizhou, Chen, Hui, Li, Leida, Ding, Guiguang, Keutzer, Kurt

arXiv.org Artificial Intelligence

Existing methods for visual emotion analysis mainly focus on coarse-grained emotion classification, i.e., assigning an image a dominant discrete emotion category. However, these methods cannot fully reflect the complexity and subtlety of emotions. In this paper, we study the fine-grained regression problem of visual emotions based on convolutional neural networks (CNNs). Specifically, we develop the Polarity-consistent Deep Attention Network (PDANet), a novel network architecture that integrates attention into a CNN with an emotion-polarity constraint. First, we propose to incorporate both spatial and channel-wise attention into a CNN for visual emotion regression, jointly considering the local spatial connectivity patterns along each channel and the interdependencies between different channels. Second, we design a novel regression loss, the polarity-consistent regression (PCR) loss, which uses weakly supervised emotion polarity to guide the attention generation. By optimizing the PCR loss, PDANet generates a polarity-preserving attention map and thus improves emotion regression performance. Extensive experiments are conducted on the IAPS, NAPS, and EMOTIC datasets, and the results demonstrate that the proposed PDANet outperforms state-of-the-art approaches by a large margin for fine-grained visual emotion regression. Our source code is released at: https://github.com/ZizhouJia/PDANet.
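
The PCR loss is the most transferable idea here: keep a standard regression loss but add a penalty whenever the prediction lands on the wrong side of the neutral point. A minimal sketch of that idea in PyTorch; the neutral value of 5.0 (the midpoint of the typical 1-9 valence scale used by IAPS) and the exact penalty form are assumptions, so see the paper for the actual loss:

```python
import torch

def pcr_loss(pred: torch.Tensor, target: torch.Tensor,
             neutral: float = 5.0, weight: float = 1.0) -> torch.Tensor:
    """Polarity-consistent regression loss, sketched from the abstract.

    Adds a penalty when the predicted valence falls on the opposite
    side of the neutral point from the ground truth, on top of MSE.
    """
    mse = torch.mean((pred - target) ** 2)
    # Positive only when pred and target lie on opposite sides of neutral,
    # growing with how far the prediction strays across the boundary.
    polarity = torch.relu(-(pred - neutral) * (target - neutral))
    return mse + weight * polarity.mean()
```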